import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%load_ext watermark
%watermark --author "Ryan Sloot |" -d -v
$\qquad$Information content of event $A$: $\ \log_2\frac{1}{p(A)}$
$\qquad H(X) = \sum_{x\in X}p_X(x)\cdot \log_2\frac{1}{p_X(x)}$
entropy = lambda p: np.sum(p * np.log2(1 / p))
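As a quick sanity check of the helper, a fair coin should carry exactly 1 bit and a 90/10 coin less than half a bit (the lambda assumes strictly positive probabilities, since $\log_2\frac{1}{0}$ is undefined):
print(entropy(np.array([0.5, 0.5])))   # 1.0 bit for a fair coin
print(entropy(np.array([0.9, 0.1])))   # ~0.4690 bits for a 90/10 coin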
## three lotteries {amount gained: prob}
L1 = {-1: 999999/1000000, 9999: 1/1000000}
L2 = {-1: 999999/1000000, 999999: 1/1000000}
L3 = {-1: 9/10, 9: 1/10}
## compute entropy in bits for each Lottery
H_l1 = entropy(np.array([prob for prob in L1.values()]))
H_l2 = entropy(np.array([prob for prob in L2.values()]))
H_l3 = entropy(np.array([prob for prob in L3.values()]))
print('Entropies (in bits):\nL1: %.7f\nL2: %.7f\nL3: %.7f' % (H_l1,H_l2,H_l3))
For a random variable $X$ that takes on one of two values, one with probability $p$ and the other with probability $1-p$, we plot the entropy $H(p)$ as a function of $p$.
p_list = np.linspace(0, 1, 51)[1:-1]  # 51 points so the grid hits p=0.5; endpoints dropped to avoid log2(1/0)
hs = np.array([entropy(np.array([p, 1-p])) for p in p_list])
plt.plot(p_list, hs)
plt.plot(p_list[np.argmax(hs)], hs.max(), 'ro')
plt.text(p_list[np.argmax(hs)], hs.max(),
         '  Max Entropy: %.7f\n  P: %.2f' % (hs.max(), p_list[np.argmax(hs)]))
plt.ylim(0,1.05)
plt.xlabel('p')
plt.ylabel('H(Ber(p))')
plt.show()
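The curve peaks at $p = \tfrac{1}{2}$; setting the derivative of the binary entropy to zero confirms this:
$\qquad \frac{d}{dp}\left[p\log_2\frac{1}{p} + (1-p)\log_2\frac{1}{1-p}\right] = \log_2\frac{1-p}{p} = 0 \ \Rightarrow\ p = \tfrac{1}{2}, \quad H\!\left(\tfrac{1}{2}\right) = 1 \text{ bit}$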
$\qquad D(p\parallel q) = \sum_x p(x)\cdot \log_2\frac{1}{q(x)} - \sum_x p(x)\cdot \log_2\frac{1}{p(x)}$
$\qquad \qquad \ \ \ \ \ = \mathbb{E}_{X\sim p}\left[\log_2\frac{p(X)}{q(X)}\right]$
$\qquad I(X;Y) = D(p_{X,Y} \parallel p_Xp_Y)$
## joint pmf p_{X,Y} (rows index x, columns index y)
joint_prob_XY = np.array([[0.10, 0.09, 0.11],
                          [0.08, 0.07, 0.07],
                          [0.18, 0.13, 0.17]])
## marginalize to get p_X (rows) and p_Y (columns)
prob_X = joint_prob_XY.sum(axis=1)
prob_Y = joint_prob_XY.sum(axis=0)
## joint probability IF X and Y were independent
joint_prob_XY_indep = np.outer(prob_X, prob_Y)
joint_prob_XY_indep
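A quick consistency check: both tables should sum to 1, and the independence surrogate should reproduce the original marginals:
print(joint_prob_XY.sum(), joint_prob_XY_indep.sum())        # both ~1.0
print(np.allclose(joint_prob_XY_indep.sum(axis=1), prob_X))  # True
print(np.allclose(joint_prob_XY_indep.sum(axis=0), prob_Y))  # True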
Mutual information of $X$ and $Y$ is given by the divergence between $p_{X,Y}$ and $p_Xp_Y$:
$\qquad I(X;Y) = D(p_{X,Y}\parallel p_{X}p_{Y}) = \sum _ x \sum _ y p_{X, Y}(x, y) \log _2 \frac{p_{X, Y}(x, y)}{p_ X(x) p_ Y(y)}.$
Information divergence in general:
$\qquad D(p\parallel q)=\sum_x p(x)\, \log_2 \frac{p(x)}{q(x)}$
info_divergence = lambda p, q: np.sum(p * np.log2(p / q))  # D(p||q); assumes matching support with no zeros
mutual_info_XY = info_divergence(joint_prob_XY,
                                 joint_prob_XY_indep)
mutual_info_XY
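As a cross-check, the same value follows from the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$, reusing the entropy helper from above:
H_X = entropy(prob_X)
H_Y = entropy(prob_Y)
H_XY = entropy(joint_prob_XY.flatten())
print(H_X + H_Y - H_XY, mutual_info_XY)   # the two values should agree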
Suppose we have three r.v.s $S, C, D$, where both $C$ and $D$ depend on $S$. We know:
$\qquad p_{C\ |\ S}(c\ |\ s) = 1\ /\ (2s+1)$ for $c \in \{ 0,1,\ldots ,2s\}$
and $D$ given $S = s$ is $\text{Binomial}(s, q)$:
$\qquad p_{D\ |\ S}(d\ |\ s) = \begin{cases} {s \choose d}\, q^d\, (1-q)^{s-d} & d \in \{0,\ldots,s\} \\ 0 & \text{otherwise} \end{cases}$
s = np.array([1, 2, 3, 4])
p_S = np.array([.25, .25, .25, .25])
cs = np.arange(2 * s.max() + 1)              # C ranges over {0, ..., 2*max(s)}
p_CS = np.zeros((len(cs), len(s)))           # p_{C|S}(c|s): rows index c, columns index s
for i in range(len(s)):
    p_CS[:2*s[i] + 1, i] = 1 / (2*s[i] + 1)  # uniform on {0,...,2s}, zero elsewhere
E_C_given_S = (p_CS * cs[:, None]).sum(axis=0)   # E[C | S=s] for each s
p_C = (p_CS * p_S).sum(axis=1)               # marginal: p_C(c) = sum_s p_{C|S}(c|s) p_S(s)
E_C = (p_C * cs).sum()                       # E[C]
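A quick check that each conditional column is a valid pmf; since $C\mid S=s$ is uniform on $\{0,\ldots,2s\}$, $\mathbb{E}[C\mid S=s] = s$, so $\mathbb{E}[C] = \mathbb{E}[S] = 2.5$:
print(np.allclose(p_CS.sum(axis=0), 1))   # each column sums to 1
print('E[C | S=s]:', E_C_given_S)         # [1. 2. 3. 4.]
print('E[C]: %.2f' % E_C)                 # 2.50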
import scipy.stats

def pmf_DS(p, d):
    # p_{D|S}(d|s) for each s in {1,2,3,4}; D|S=s ~ Binomial(s, p), with p playing the role of q above
    s_values = [1, 2, 3, 4]
    p_ds = []
    for s_val in s_values:
        p_ds.append(scipy.stats.binom(s_val, p).pmf(d))
    return np.array(p_ds)

pmf_DS(.2, 1)
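Sanity check: for each $s$, summing the conditional pmf over $d$ should give 1 (scipy's `pmf` returns 0 outside the support $\{0,\ldots,s\}$):
print(np.sum([pmf_DS(.2, d) for d in range(5)], axis=0))   # [1. 1. 1. 1.]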
Expected number of rolls to see two consecutive 6s with a fair die
## Geometric dist
p = 1/6
one_six_expected = 1/p #E[X]=1/p
print('%d rolls to see one six' % one_six_expected)
It takes 6 rolls on average to see the first six; the next roll either completes the pair (probability $p = 1/6$) or forces a restart. The number of such attempts is geometric with mean $1/p$, and each attempt costs one_six_expected + 1 = 7 rolls on average, giving $6 \cdot 7 = 42$ rolls in total:
consecutive_sixes = 1/p*(one_six_expected+1)
print('%.f rolls (on average) to see two consecutive sixes' % consecutive_sixes)
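A simulation makes a good sanity check here; this is a minimal sketch (assuming NumPy's `default_rng`) whose average should land close to 42:
## Monte Carlo check of the 42-roll answer
rng = np.random.default_rng(0)

def rolls_until_two_sixes():
    count, prev_six = 0, False
    while True:
        count += 1
        roll = rng.integers(1, 7)      # fair die: uniform on 1..6
        if roll == 6 and prev_six:
            return count
        prev_six = (roll == 6)

print(np.mean([rolls_until_two_sixes() for _ in range(20000)]))   # ~42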
Entropy and information divergence come up often in probabilistic modeling, especially when using maximum likelihood to decide which model to use. Information divergence tells us how far a candidate model is from the observed data. Mutual information helps us decide which r.v.s should have their pairwise interactions modeled directly, based on whether the information shared between them justifies including that interaction.